Removing Statistical Biases in Unsupervised Sequence Learning
Authors
Abstract
Unsupervised sequence learning is important to many applications. A learner is presented with unlabeled sequential data, and must discover sequential patterns that characterize the data. Popular approaches to such learning include statistical analysis and frequency-based methods. We empirically compare these approaches and find that both are biased toward shorter sequences and are unable to group together multiple instances of the same pattern. We provide methods to address these deficiencies, and evaluate them extensively on several synthetic and real-world data sets. The results show significant improvements in all learning methods used.
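The bias toward shorter sequences mentioned in the abstract can be illustrated with a small sketch. This is not the paper's method or its data: the toy sequence, the function name `ngram_counts`, and the choice of contiguous subsequences are all assumptions made for illustration. It only shows why ranking by raw frequency tends to favor short patterns: a pattern can never occur more often than its own prefix.

```python
from collections import Counter


def ngram_counts(seq, max_len):
    """Count every contiguous subsequence of length 1..max_len (hypothetical helper)."""
    counts = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(seq) - n + 1):
            counts[tuple(seq[i:i + n])] += 1
    return counts


# Made-up toy data: the pattern "abc" occurs three times, with extra symbols between.
seq = list("abcaabcbabcc")
counts = ngram_counts(seq, max_len=3)

# Every occurrence of a pattern is also an occurrence of its prefix,
# so the prefix count is always at least as large as the pattern count.
prefix_never_rarer = all(
    counts[pattern[:-1]] >= count
    for pattern, count in counts.items()
    if len(pattern) > 1
)

print(counts[("a",)], counts[("a", "b")], counts[("a", "b", "c")])
print(prefix_never_rarer)
```

Under these assumptions, the single symbol `a` is counted at least as often as `ab`, which in turn is counted at least as often as `abc`, even though `abc` is the pattern that was planted. Any correction for this bias has to compare patterns of different lengths on some normalized scale rather than by raw count.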
Similar resources
Removing biases in unsupervised learning of sequential patterns
Unsupervised sequence learning is important to many applications. A learner is presented with unlabeled sequential data, and must discover sequential patterns that characterize the data. Popular approaches to such learning include (and often combine) frequency-based approaches and statistical analysis. However, the quality of results is often far from satisfactory. Though most previous investig...
Unsupervised Learning
Unsupervised learning studies how systems can learn to represent particular input patterns in a way that reflects the statistical structure of the overall collection of input patterns. By contrast with SUPERVISED LEARNING or REINFORCEMENT LEARNING, there are no explicit target outputs or environmental evaluations associated with each input; rather the unsupervised learner brings to bear prior b...
From unsupervised learning to data mining: linking cognition and data analysis
Recently, Knowledge Discovery on Databases (KDD) has emerged as a promising research area encompassing methods from several disciplines. Particularly, the data mining step of KDD shares most of its goals with unsupervised learning. But data mining methods are biased towards statistical techniques arguing that Machine Learning (ML) methods are not suitable to deal with real-world databases. We c...
A Survey of Inductive Biases for Factorial Representation-Learning
With the resurgence of interest in neural networks, representation learning has re-emerged as a central focus in artificial intelligence. Representation learning refers to the discovery of useful encodings of data that make domain-relevant information explicit. Factorial representations identify underlying independent causal factors of variation in data. A factorial representation is compact an...
Application of Feature Selection for Unsupervised Learning in Prosecutors' Office
Feature selection is effective in removing irrelevant data. However, the result of feature selection in unsupervised learning is not as satisfying as that in supervised learning. In this paper, we propose a novel methodology ULAC (Feature Selection for Unsupervised Learning Based on Attribute Correlation Analysis and Clustering Algorithm) to identify important features for unsupervised learning...